{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Protein preparation with HTMD\n",
    "===============\n",
    "\n",
    "*Toni Giorgino*\n",
    "\n",
    "\n",
    "The system preparation phase is based on the PDB2PQR software. It \n",
    "includes the following steps (from the\n",
    "[PDB2PQR algorithm\n",
    "description](http://www.poissonboltzmann.org/docs/pdb2pqr-algorithm-description/)):\n",
    "\n",
    " * Assign titration states at the user-chosen pH;\n",
    " * Flipping the side chains of HIS (including user defined HIS states), ASN, and GLN residues;\n",
    " * Rotating the sidechain hydrogen on SER, THR, TYR, and CYS (if available);\n",
    " * Determining the best placement for the sidechain hydrogen on neutral HIS, protonated GLU, and protonated ASP;\n",
    " * Optimizing all water hydrogens.\n",
    "\n",
    "The hydrogen bonding network calculations are performed by the\n",
    "[PDB2PQR](http://www.poissonboltzmann.org/) software package. The pKa\n",
    "calculations are performed by the [PROPKA\n",
    "3.1](https://github.com/jensengroup/propka-3.1) software packages.\n",
    "Please see the copyright, license  and citation terms distributed with each.\n",
    "\n",
    "Note that this version was modified in order to use an \n",
    "externally-supplied propKa **3.1** (installed automatically via dependencies), whereas\n",
    "the original had propKa 3.0 *embedded*!\n",
    "\n",
    "The results of the function should be roughly equivalent of the system\n",
    "preparation wizard's preprocessing and optimization steps\n",
    "of Schrodinger's Maestro software."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Usage\n",
    "----------\n",
    "\n",
    "The `proteinPrepare` function requires a molecule object, the protein to be prepared, as an argument, and returns the prepared system, also as a molecule object. Logging messages will provide information and warnings about the process.\n",
    "\n",
    "The full documentation is in the docstring, accessible via the usual Python help mechanism."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2016-06-14 16:57:31,532 - htmd.molecule.molecule - INFO - Using local copy for 3PTB: /home/toni/work/htmd/htmd/htmd/data/pdb/3ptb.pdb\n",
      "2016-06-14 16:57:31,714 - propka - INFO - No pdbfile provided\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Please cite. HTMD: High-Throughput Molecular Dynamics for Molecular Discovery, J. Chem. Theory Comput., 2016, 12 (4), pp 1845-1852. \n",
      "http://pubs.acs.org/doi/abs/10.1021/acs.jctc.6b00049\n",
      "\n",
      "\n",
      "You are on the latest HTMD version (unpackaged).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2016-06-14 16:57:34,059 - htmd.builder.preparation - WARNING - The following residue has not been optimized: CA\n",
      "2016-06-14 16:57:34,059 - htmd.builder.preparation - WARNING - The following residue has not been optimized: BEN\n",
      "2016-06-14 16:57:37,474 - htmd.builder.residuedata - WARNING - Dubious protonation state: the pKa of 3 residues is within 1.0 units of pH 7.0.\n",
      "2016-06-14 16:57:37,476 - htmd.builder.residuedata - WARNING - Dubious protonation state:    HIS 57  A\n",
      "2016-06-14 16:57:37,477 - htmd.builder.residuedata - WARNING - Dubious protonation state:    GLU 70  A\n",
      "2016-06-14 16:57:37,477 - htmd.builder.residuedata - WARNING - Dubious protonation state:    N+ 16  A\n"
     ]
    }
   ],
   "source": [
    "from htmd import *\n",
    "tryp = Molecule('3PTB')\n",
    "tryp_op = proteinPrepare(tryp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The optimized molecule can be written and further manipulated as usual."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "tryp_op.write('systempreparation-test-main-ph-7.pdb')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Information about the prepared system\n",
    "\n",
    "A table of useful information, an object of type `ResidueData`, is available as a return argument if the `returnDetails` argument is set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2016-06-14 16:57:40,009 - htmd.builder.preparation - WARNING - The following residue has not been optimized: CA\n",
      "2016-06-14 16:57:40,010 - htmd.builder.preparation - WARNING - The following residue has not been optimized: BEN\n",
      "2016-06-14 16:57:43,494 - htmd.builder.residuedata - WARNING - Dubious protonation state: the pKa of 3 residues is within 1.0 units of pH 7.0.\n",
      "2016-06-14 16:57:43,496 - htmd.builder.residuedata - WARNING - Dubious protonation state:    HIS 57  A\n",
      "2016-06-14 16:57:43,497 - htmd.builder.residuedata - WARNING - Dubious protonation state:    GLU 70  A\n",
      "2016-06-14 16:57:43,497 - htmd.builder.residuedata - WARNING - Dubious protonation state:    N+ 16  A\n"
     ]
    }
   ],
   "source": [
    "tryp_op, prepData = proteinPrepare(tryp, returnDetails=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `ResidueData` object carries a wealth of information on the preparation results. Most of it is accessible in the `data` property, which is a Pandas object. As such, it can be easily written as a spreadsheet in Excel or CSV format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "prepData.data.to_excel(\"tryp-report.xlsx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Membrane proteins\n",
    "\n",
    "Membrane-embedded proteins are in contact with an hydrophobic region which may alter pKa values for membrane-exposed residues ([Teixera et al.](http://dx.doi.org/10.1021/acs.jctc.5b01114)). Although the effect is not currently   taken into account quantitatively, if a `hydrophobicThickness` argument is provided, warnings will be generated for residues exposed to the lipid region.\n",
    "\n",
    "The following example shows the preparation of the mu opioid receptor, 4DKL. The pre-oriented structure is retrieved  from the OPM database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2016-06-14 16:57:45,286 - htmd.molecule.molecule - INFO - Removed 2546 atoms. 4836 atoms remaining in the molecule.\n",
      "2016-06-14 16:57:54,880 - htmd.builder.preparation - WARNING - The following residue has not been optimized: BF0\n",
      "2016-06-14 16:57:54,881 - htmd.builder.preparation - WARNING - The following residue has not been optimized: SO4\n",
      "2016-06-14 16:57:54,881 - htmd.builder.preparation - WARNING - The following residue has not been optimized: CLR\n",
      "2016-06-14 16:57:54,882 - htmd.builder.preparation - WARNING - The following residue has not been optimized: MPG\n",
      "2016-06-14 16:57:54,882 - htmd.builder.preparation - WARNING - The following residue has not been optimized: 1PE\n",
      "2016-06-14 16:57:54,883 - htmd.builder.preparation - WARNING - The following residue has not been optimized: CL\n",
      "2016-06-14 16:58:03,854 - htmd.builder.residuedata - WARNING - Dubious protonation state: the pKa of 8 residues is within 1.0 units of pH 7.0.\n",
      "2016-06-14 16:58:03,857 - htmd.builder.residuedata - WARNING - Dubious protonation state:    ASP 114  A\n",
      "2016-06-14 16:58:03,857 - htmd.builder.residuedata - WARNING - Dubious protonation state:    HIS 223  A\n",
      "2016-06-14 16:58:03,857 - htmd.builder.residuedata - WARNING - Dubious protonation state:    LYS 233  A\n",
      "2016-06-14 16:58:03,858 - htmd.builder.residuedata - WARNING - Dubious protonation state:    ASP 114  B\n",
      "2016-06-14 16:58:03,858 - htmd.builder.residuedata - WARNING - Dubious protonation state:    HIS 223  B\n",
      "2016-06-14 16:58:03,859 - htmd.builder.residuedata - WARNING - Dubious protonation state:    LYS 233  B\n",
      "2016-06-14 16:58:03,859 - htmd.builder.residuedata - WARNING - Dubious protonation state:    N+ 65  A\n",
      "2016-06-14 16:58:03,860 - htmd.builder.residuedata - WARNING - Dubious protonation state:    N+ 65  B\n",
      "2016-06-14 16:58:03,864 - htmd.builder.residuedata - WARNING - Predictions for 30 residues may be incorrect because they are exposed to the membrane (-16.0<z<16.00 and buried<75.0%).\n"
     ]
    }
   ],
   "source": [
    "mor, thickness = htmd.util.opm(\"4dkl\")\n",
    "\n",
    "mor_opt, mor_data = proteinPrepare(mor, returnDetails=True,\n",
    "                                   hydrophobicThickness=thickness)\n",
    "\n",
    "exposedRes = mor_data.data.membraneExposed\n",
    "mor_data.data[exposedRes].to_excel(\"mor_exposed_residues.xlsx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Modified residue names\n",
    "----------------------\n",
    "\n",
    "The molecule produced by the preparation step has residue names modified\n",
    "according to their protonation.\n",
    "Later system-building functions assume these residue names. \n",
    "Note that support for alternative charge states varies between the  forcefields.\n",
    "\n",
    "Charge +1    |  Neutral   | Charge -1\n",
    "-------------|------------|----------\n",
    " -           |  ASH       | ASP\n",
    " -           |  CYS       | CYM\n",
    " -           |  GLH       | GLU\n",
    "HIP          |  HID/HIE   |  -\n",
    "LYS          |  LYN       |  -\n",
    " -           |  TYR       | TYM\n",
    "ARG          |  AR0       |  -\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Full help"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on function proteinPrepare in module htmd.builder.preparation:\n",
      "\n",
      "proteinPrepare(mol_in, pH=7.0, verbose=0, returnDetails=False, hydrophobicThickness=None, holdSelection=None)\n",
      "    A system preparation wizard for HTMD.\n",
      "    \n",
      "    Returns a Molecule object, where residues have been renamed to follow\n",
      "    internal conventions on protonation (below). Coordinates are changed to\n",
      "    optimize the H-bonding network. This should be roughly equivalent to mdweb and Maestro's\n",
      "    preparation wizard.\n",
      "    \n",
      "    The following residue names are used in the returned molecule:\n",
      "    \n",
      "        ASH     Neutral ASP\n",
      "        CYX     SS-bonded CYS\n",
      "        CYM     Negative CYS\n",
      "        GLH     Neutral GLU\n",
      "        HIP     Positive HIS\n",
      "        HID     Neutral HIS, proton HD1 present\n",
      "        HIE     Neutral HIS, proton HE2 present\n",
      "        LYN     Neutral LYS\n",
      "        TYM     Negative TYR\n",
      "        AR0     Neutral ARG\n",
      "    \n",
      "    If hydrophobicThickness is set to a positive value 2*h, a warning is produced for titratable residues\n",
      "    having -h<z<h and are buried in the protein by less than 75%. The list of such residues can be accessed setting\n",
      "    returnDetails to True. Note that the heuristic for the detection of membrane-exposed residues is very crude;\n",
      "    the \"buried fraction\" computation (from propka) is approximate; also, in the presence of cavities,\n",
      "    residues may be solvent-exposed independently from their z location.\n",
      "    \n",
      "    \n",
      "    Notes\n",
      "    -----\n",
      "    In case of problems, exclude water and other dummy atoms.\n",
      "    \n",
      "    \n",
      "    Features\n",
      "    --------\n",
      "     - assign protonation states via propKa\n",
      "     - flip residues to optimize H-bonding network\n",
      "     - debump collisions\n",
      "     - fill-in missing atoms, e.g. hydrogen atoms\n",
      "    \n",
      "    \n",
      "    Parameters\n",
      "    ----------\n",
      "    mol_in : htmd.Molecule\n",
      "        the object to be optimized\n",
      "    pH : float\n",
      "        pH to decide titration\n",
      "    verbose : int\n",
      "        verbosity\n",
      "    returnDetails : bool\n",
      "        whether to return just the prepared Molecule (False, default) or a molecule *and* a ResidueInfo\n",
      "        object including computed properties\n",
      "    hydrophobicThickness : float\n",
      "        the thickness of the membrane in which the protein is embedded, or None if globular protein.\n",
      "        Used to provide a warning about membrane-exposed residues.\n",
      "    holdSelection : str\n",
      "        Atom selection to be excluded from optimization.\n",
      "        Only the carbon-alpha atom will be considered for the corresponding residue.\n",
      "    \n",
      "    \n",
      "    Returns\n",
      "    -------\n",
      "    mol_out : Molecule\n",
      "        the molecule titrated and optimized. The molecule object contains an additional attribute,\n",
      "    resData : ResidueData\n",
      "        a table of residues with the corresponding protonation states, pKas, and other information\n",
      "    \n",
      "    \n",
      "    Examples\n",
      "    --------\n",
      "    >>> tryp = Molecule('3PTB')\n",
      "    \n",
      "    >>> tryp_op, prepData = proteinPrepare(tryp, returnDetails=True)\n",
      "    >>> tryp_op.write('proteinpreparation-test-main-ph-7.pdb')\n",
      "    >>> prepData.data.to_excel(\"/tmp/tryp-report.xlsx\")\n",
      "    >>> prepData\n",
      "    ResidueData object about 290 residues.\n",
      "    Unparametrized residue names: CA, BEN\n",
      "    Please find the full info in the .data property, e.g.:\n",
      "      resname  resid insertion chain       pKa protonation flipped     buried\n",
      "    0     ILE     16               A       NaN         ILE     NaN        NaN\n",
      "    1     VAL     17               A       NaN         VAL     NaN        NaN\n",
      "    2     GLY     18               A       NaN         GLY     NaN        NaN\n",
      "    3     GLY     19               A       NaN         GLY     NaN        NaN\n",
      "    4     TYR     20               A  9.590845         TYR     NaN  14.642857\n",
      "     . . .\n",
      "    >>> x_HIE91_ND1 = tryp_op.get(\"coords\",\"resid 91 and  name ND1\")\n",
      "    >>> x_SER93_H =   tryp_op.get(\"coords\",\"resid 93 and  name H\")\n",
      "    >>> len(x_SER93_H) == 3\n",
      "    True\n",
      "    >>> np.linalg.norm(x_HIE91_ND1-x_SER93_H) < 3\n",
      "    True\n",
      "    \n",
      "    >>> tryp_op = proteinPrepare(tryp, pH=1.0)\n",
      "    >>> tryp_op.write('proteinpreparation-test-main-ph-1.pdb')\n",
      "    \n",
      "    >>> tryp_op = proteinPrepare(tryp, pH=14.0)\n",
      "    >>> tryp_op.write('proteinpreparation-test-main-ph-14.pdb')\n",
      "    \n",
      "    >>> mol = Molecule(\"1r1j\")\n",
      "    >>> mo, prepData = proteinPrepare(mol, returnDetails=True)\n",
      "    >>> prepData.missedLigands\n",
      "    ['NAG', 'ZN', 'OIR']\n",
      "    \n",
      "    >>> his = prepData.data.resname == \"HIS\"\n",
      "    >>> prepData.data[his][[\"resid\",\"insertion\",\"chain\",\"resname\",\"protonation\"]]\n",
      "         resid insertion chain resname protonation\n",
      "    160    214               A     HIS         HID\n",
      "    163    217               A     HIS         HID\n",
      "    383    437               A     HIS         HID\n",
      "    529    583               A     HIS         HID\n",
      "    533    587               A     HIS         HIP\n",
      "    583    637               A     HIS         HID\n",
      "    627    681               A     HIS         HID\n",
      "    657    711               A     HIS         HIP\n",
      "    679    733               A     HIS         HID\n",
      "    \n",
      "    >>> mor = Molecule(\"4dkl\")\n",
      "    >>> mor.filter(\"protein and noh\")\n",
      "    >>> mor_opt, mor_data = proteinPrepare(mor, returnDetails=True,\n",
      "    ...                                    hydrophobicThickness=32.0)\n",
      "    >>> exposedRes = mor_data.data.membraneExposed\n",
      "    >>> mor_data.data[exposedRes].to_excel(\"/tmp/mor_exposed_residues.xlsx\")\n",
      "    \n",
      "    >>> im=Molecule(\"4bkj\")\n",
      "    >>> imo,imd=proteinPrepare(im,returnDetails=True)\n",
      "    >>> imd.data.to_excel(\"/tmp/imatinib_report.xlsx\")\n",
      "    \n",
      "    \n",
      "    See Also\n",
      "    --------\n",
      "    The ResidueData object.\n",
      "    \n",
      "    \n",
      "    Unsupported/To Do/To Check\n",
      "    --------------------------\n",
      "     - ligands\n",
      "     - termini\n",
      "     - force residues\n",
      "     - multiple chains\n",
      "     - nucleic acids\n",
      "     - coupled titrating residues\n",
      "     - Disulfide bridge detection (implemented but unused)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(proteinPrepare)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Acknowledgements and citations\n",
    "=========\n",
    "\n",
    "Please acknowledge your use of PDB2PQR by citing:\n",
    "\n",
    " *   Dolinsky TJ, Czodrowski P, Li H, Nielsen JE, Jensen JH, Klebe G, Baker NA. PDB2PQR: Expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res, 35, W522-5, 2007. \n",
    " *   Dolinsky TJ, Nielsen JE, McCammon JA, Baker NA. PDB2PQR: an automated pipeline for the setup, execution, and analysis of Poisson-Boltzmann electrostatics calculations. Nucleic Acids Res, 32, W665-W667, 2004.\n",
    " \n",
    " \n",
    "Please acknowledge your use of PROPKA by citing:\n",
    "\n",
    " *   Sondergaard, Chresten R., Mats HM Olsson, Michal Rostkowski, and Jan H. Jensen. \"Improved Treatment of Ligands and Coupling Effects in Empirical Calculation and Rationalization of pKa Values.\" Journal of Chemical Theory and Computation 7, no. 7 (2011): 2284-2295.\n",
    " *   Olsson, Mats HM, Chresten R. Sondergaard, Michal Rostkowski, and Jan H. Jensen. \"PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions.\" Journal of Chemical Theory and Computation 7, no. 2 (2011): 525-537.\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3.0
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}